The demo for SRBR_SMTSAworkshop

Gang Wu, Tanya Leise, and John Hogenesch

2016-05-17

1. The introduction about MetaCycle

1.1. General description about MetaCycle

The MetaCycle package is mainly used for detecting rhythmic signals from large scale time-series data. It incorporates ARSER(ARS), JTK_CYCLE(JTK), and Lomb-Scargle(LS) properly for periodic signal detection, and could also output integrated analysis results if required.

1.2. Types of time-series data

The usual time-series data is evenly sampled once at each time point, and the interval between neighbour time points is integer. Not all data are as simple as this. There are datasets with replicate samples, or with missing values, or un-evenly sampled, or sampled with a non-integer interval. Examples of these types of data are shown in the below table.

Data Type Point 1 Point 2 Point 3 Point 4 Point 5 Point 6
The usual data CT0 CT4 CT8 CT12 CT16 CT20
With missing value CT0 NA CT8 CT12 CT16 CT20
With replicates CT0 CT0 CT8 CT8 CT16 CT16
With un-even interval CT0 CT2 CT8 CT10 CT16 CT20
With non-integer interval CT0 CT4.5 CT9 CT13.5 CT18 CT22.5

Of course, some datasets may seem combination of two or more of above types of data.

Data Type Point 1 Point 2 Point 3 Point 4 Point 5 Point 6
With replicates and missing value CT0 CT0 CT8 NA CT16 CT16
With un-even interval and replicates CT0 CT2 CT2 CT10 CT16 CT20

1.3. Method selection

The meta2d function in MetaCycle is designed to analyze differnt types of time-series datasets, and it could automatically select proper method to analyze different types of input datasets. The implementation strategy used for meta2d is shown in the flow chart.

1.4. Integration strategies

In addition to selecting proper methods to analyze different kinds of datasets, meta2d could also output integrated results from multiple methods. Detail explaination about integration stragegies is in the vignettes of MetaCycle.

\[Y_i = B + TRE*(t_i - \frac{\sum_{i=1}^n t_i}{n}) + A*cos(2*\pi*\frac{t_i - PHA}{PER})\]

- B is baseline level of the time-series profile
- TRE is trend level of the time-series profile
- A is the amplitude of the waveform
- PHA and PER are integrated period and phase mentioned above. 
- The baseline and trend level are explained in the below example.

1.5. Description about output values

Column name Description Column name Description
ARS_pvalue pvalue from ARS LS_BH.Q FDR from LS
ARS_BH.Q FDR from ARS LS_period period from LS
ARS_period period from ARS LS_adjphase adjusted phase from LS
ARS_adjphase adjusted phase from ARS LS_amplitude amplitude from LS
ARS_amplitude amplitude from ARS meta2d_pvalue integrated pvalue
JTK_pvalue pvalue from JTK meta2d_BH.Q FDR based on integrated pvalue
JTK_BH.Q FDR from JTK meta2d_period averaged period
JTK_period period from JTK meta2d_phase integrated phase
JTK_adjphase adjusted phase from JTK meta2d_Base baseline value given by meta2d
JTK_amplitude amplitude from JTK meta2d_AMP amplitude given by meta2d
LS_pvalue pvalue from LS meta2d_rAMP relative amplitude

2. Take a quick check about installed software and packages

# load 'shiny' package
library("shiny")
# load 'MetaCycle' package
library("MetaCycle")
# load 'dplyr' package
library("dplyr")
# load 'ggplot2' package
library("ggplot2")
# load 'cowplot' package
library("cowplot")
# change working directory to the desktop 
setwd("~/Desktop")

# check required directories under the desktop
dir()

Try to open a ‘jar’ file

3. The demo pipeline about MetaCycle

3.1. Open the web application of meta2d

# load shiny package
library("shiny")

# use 'runGitHub' function of shiny package
runGitHub("MetaCycleApp", "gangwug")
# load shiny package
library("shiny")

# use 'runApp' function of shiny package
runApp("~/Desktop/MetaCycleApp-master")

3.2. Analyze three cases with the web application of meta2d

3.3. Please analyze five exercise datasets with MetaCycleApp according to instructions mentioned above

3.4. Analyze the experimental data (it may take several minutes)

3.5. Check the expression profiles of circadian transcripts (FDR < 0.05) and show their phase distribution

# load dplyr package
library("dplyr")

# read in the 'meta2d_experientA.csv' file
dataD <- read.csv("~/Desktop/SRBR_SMTSAworkshop-master/result/meta2d_experimentA.csv",
                   stringsAsFactors = FALSE)
# take a look at column names of the data
colnames(dataD)
##  [1] "CycID"         "JTK_pvalue"    "JTK_BH.Q"      "JTK_period"   
##  [5] "JTK_adjphase"  "JTK_amplitude" "LS_pvalue"     "LS_BH.Q"      
##  [9] "LS_period"     "LS_adjphase"   "LS_amplitude"  "meta2d_pvalue"
## [13] "meta2d_BH.Q"   "meta2d_period" "meta2d_phase"  "meta2d_Base"  
## [17] "meta2d_AMP"    "meta2d_rAMP"   "X18"           "X20"          
## [21] "X22"           "X24"           "X26"           "X28"          
## [25] "X30"           "X32"           "X34"           "X36"          
## [29] "X38"           "X40"           "X42"           "X44"          
## [33] "X46"           "X48"           "X50"           "X52"          
## [37] "X54"           "X56"           "X58"           "X60"          
## [41] "X62"           "X64"
# filter the data by meta2d_BH.Q < 0.05 by 'filter' function of dplyr
cirD <- filter(dataD, meta2d_BH.Q < 0.05)
# see the number of circadian transcripts
nrow(cirD)
## [1] 706
# select "CycID", "meta2d_period", "meta2d_phase", and "X18" to "X64" columns 
# for drawing heatmap by 'select' function of dplyr package
figureD <- select(cirD, CycID, meta2d_period, meta2d_phase, 
                   num_range("X", seq(18, 64, by=2), width = 2))
# load 'heatmapF' and 'phaseHist' function used to draw heatmap and histogram
source("~/Desktop/SRBR_SMTSAworkshop-master/R/fig.R")

# get the heatmap figure by 'heatmapF' function
heatmapFigure <- heatmapF(inputD = figureD, minfold=0.5, maxfold=2)

heatmapFigure

# get the phase distribtuion figure by 'phaseHist' function
histFigure <- phaseHist(inputD = figureD, binvalue=seq(0,24,by=2), histcol = "blue")

histFigure

4. The demo pipeline about PSEA

4.1. Prepare the file used for PSEA analysis

# take a look at column names of 'cirD' dataframe
colnames(cirD)
##  [1] "CycID"         "JTK_pvalue"    "JTK_BH.Q"      "JTK_period"   
##  [5] "JTK_adjphase"  "JTK_amplitude" "LS_pvalue"     "LS_BH.Q"      
##  [9] "LS_period"     "LS_adjphase"   "LS_amplitude"  "meta2d_pvalue"
## [13] "meta2d_BH.Q"   "meta2d_period" "meta2d_phase"  "meta2d_Base"  
## [17] "meta2d_AMP"    "meta2d_rAMP"   "X18"           "X20"          
## [21] "X22"           "X24"           "X26"           "X28"          
## [25] "X30"           "X32"           "X34"           "X36"          
## [29] "X38"           "X40"           "X42"           "X44"          
## [33] "X46"           "X48"           "X50"           "X52"          
## [37] "X54"           "X56"           "X58"           "X60"          
## [41] "X62"           "X64"
# if it reports error after typing above command, please re-run the code of preparing 
# 'figureD' in the first part of 3.5 section of this demo
# select its 'CycID', "meta2d_BH.Q", 'meta2d_phase' columns for later analysis
phaseD <- select(cirD, CycID, meta2d_BH.Q, meta2d_phase)

# read in the annotation file-'annoDbiMainInfMouse4302.txt' in the 'data-raw' directory
annoD <- read.delim("~/Desktop/SRBR_SMTSAworkshop-master/data-raw/Mouse4302ProbeAnnotation.txt", 
                    stringsAsFactors = FALSE)

# add annotation information with 'inner_join' function of dplyr package
phaseD <- inner_join(phaseD, annoD, by=c("CycID" = "PROBEID"))
# take a look at the phaseD
head(phaseD)
##        CycID  meta2d_BH.Q meta2d_phase ENTREZID  SYMBOL
## 1 1416273_at 6.267205e-04    0.9112647    21928 Tnfaip2
## 2 1418853_at 1.005625e-04    2.2190313    28194    Apon
## 3 1418322_at 1.267709e-03   21.7744648    12916    Crem
## 4 1435748_at 5.230269e-08   16.5923801    14544     Gda
## 5 1448993_at 3.079038e-02   16.0442955    67841    Atg3
## 6 1428889_at 4.627240e-02    4.8168048    69113  Alkbh3
# filter out those probesets without annotation information
phaseD <- filter(phaseD, SYMBOL != "NA")

# select 'SYMBOL', 'meta2d_BH.Q' and 'meta2d_phase' columns for 
# getting a dataframe without duplicate gene names
ori_phaseD <- select(phaseD, SYMBOL, meta2d_BH.Q, meta2d_phase)
# take a look at the row number with possilbe duplicate gene names
dim(ori_phaseD)
## [1] 698   3
# load the 'uniF' function from 'fig.R' for doing this work
source("~/Desktop/SRBR_SMTSAworkshop-master/R/fig.R")
# get a dataframe without duplicate gene names
uni_phaseD <- uniF(ori_phaseD)
# take a look at the rownumber now
dim(uni_phaseD)
## [1] 643   3
# select 'SYMBOL' and 'meta2d_phase' columns for PSEA analysis
pseaD <- select(uni_phaseD, SYMBOL, meta2d_phase)
# take a look at the pseaD
head(pseaD)
##          SYMBOL meta2d_phase
## 1 1110059E24Rik   15.0817948
## 2 1190002N15Rik   15.7701032
## 3 1700023H06Rik    6.0991388
## 4 1810055G02Rik   11.2992141
## 5 2310035C23Rik    1.6562627
## 6 2610507B11Rik    0.8431068
# write the 'pseaD' dataframe to a txt file-'experimentPSEA.txt' in 'result' directory
write.table(pseaD, file = "~/Desktop/SRBR_SMTSAworkshop-master/result/experimentPSEA.txt",
             sep = "\t", quote = FALSE, row.names = FALSE)

4.2. Convert the mouse gene name to its human homolog gene name

# read in the 'experimentPSEA.txt' file
ori_pseaD <- read.delim("~/Desktop/SRBR_SMTSAworkshop-master/result/experimentPSEA.txt", 
                         stringsAsFactors = FALSE)
# take a look at the data
head(ori_pseaD)
##          SYMBOL meta2d_phase
## 1 1110059E24Rik   15.0817948
## 2 1190002N15Rik   15.7701032
## 3 1700023H06Rik    6.0991388
## 4 1810055G02Rik   11.2992141
## 5 2310035C23Rik    1.6562627
## 6 2610507B11Rik    0.8431068
# read in the 'MouseHumanHomolog.txt' file in 'data-raw' directory of SRBR_SMTSAworkshop-master
homoD <- read.delim("~/Desktop/SRBR_SMTSAworkshop-master/data-raw/MouseHumanHomolog.txt", 
                     stringsAsFactors = FALSE)
# take a look at the data
head(homoD)
##   Mouse_gene Human_gene
## 1      Acadm      ACADM
## 2     Acadvl     ACADVL
## 3      Acat1      ACAT1
## 4      Acvr1      ACVR1
## 5       Sgca       SGCA
## 6       Adsl       ADSL
# join mouse gene and human homolog gene with 'inner_join' function
homo_pseaD <- inner_join(ori_pseaD, homoD, by=c("SYMBOL" = "Mouse_gene"))
# take a look at the joined dataframe
head(homo_pseaD)
##          SYMBOL meta2d_phase Human_gene
## 1 1110059E24Rik   15.0817948    C9orf85
## 2 1190002N15Rik   15.7701032    C3orf58
## 3 1810055G02Rik   11.2992141   C11orf24
## 4 2310035C23Rik    1.6562627   KIAA1468
## 5 2610507B11Rik    0.8431068   KIAA0100
## 6 2700094K13Rik    9.9926117   C11orf31
# select "Human_gene" and "meta2d_phase" columns for PSEA analysis
outD <- select(homo_pseaD, Human_gene, meta2d_phase)
# take a look at the selected data
head(outD)
##   Human_gene meta2d_phase
## 1    C9orf85   15.0817948
## 2    C3orf58   15.7701032
## 3   C11orf24   11.2992141
## 4   KIAA1468    1.6562627
## 5   KIAA0100    0.8431068
## 6   C11orf31    9.9926117
# write it to a file named-'human_experimentPSEA.txt' in 'result' directory of SRBR_SMTSAworkshop-master
write.table(outD, file = "~/Desktop/SRBR_SMTSAworkshop-master/result/human_experimentPSEA.txt", 
              sep = "\t", quote = FALSE, row.names = FALSE, col.names = FALSE)

4.3. Do PSEA analysis with ‘PSEA1.1_VectorGraphics.jar’

5. Try your data or share your experience of analyzing time-series data